Parsing Biomedical Literature
Authors
Abstract
We present a preliminary study of several parser adaptation techniques evaluated on the GENIA corpus of MEDLINE abstracts [1, 2]. We begin by observing that the Penn Treebank (PTB) is lexically impoverished when measured on various genres of scientific and technical writing, and that this significantly impacts parse accuracy. To resolve this without requiring in-domain treebank data, we show how existing domain-specific lexical resources may be leveraged to augment PTB training: part-of-speech tags, dictionary collocations, and named entities. Using a state-of-the-art statistical parser [3] as our baseline, our lexically adapted parser achieves a 14.2% reduction in error. With oracle knowledge of named entities, this error reduction improves to 21.2%.
Similar Resources
Extracting Higher Order Relations From Biomedical Text
Argumentation in a scientific article is composed of unexpressed and explicit statements of old and new knowledge combined into a logically coherent textual argument. Discourse relations, linguistic coherence relations that connect discourse segments, help to communicate an argument’s logical steps. A biomedical relation exhibits a relationship between biomedical entities. In this paper, we are...
Towards Cross-Domain PDTB-Style Discourse Parsing
Discourse relation parsing is an important task with the goal of understanding text beyond sentence boundaries. With the availability of annotated corpora (Penn Discourse Treebank), statistical discourse parsers were developed. In the literature, it was shown that the discourse parsing subtasks of discourse connective detection and relation sense classification do not generalize well across d...
BioPPIExtractor: A protein-protein interaction extraction system for biomedical literature
Automatically extracting protein–protein interaction information from biomedical literature can help build protein relation networks, predict protein function, and design new drugs. This paper presents BioPPIExtractor, a protein–protein interaction extraction system for biomedical literature. The system applies a Conditional Random Fields model to tag protein names in biomedical text, then uses a li...
Extraction of Gene/Protein Interaction from Text Documents with Relation Kernel
Even though there are many databases for gene/protein interactions, most such data still exist only in the biomedical literature. They are scattered across literature written in natural language, and turning them into well-structured data requires much effort, such as data mining. As genomic research advances, knowledge discovery from a large collection of scientific papers is becom...
Mining Protein Interaction from Biomedical Literature with Relation Kernel Method
Many interaction data still exist only in the biomedical literature, and converting them into well-structured data requires much effort. Discovering useful knowledge from large collections of papers is becoming more important for efficient biological and biomedical research as genomic research advances. In this paper, we present a relation kernel-based interaction extraction method to extract know...
An Unsupervised Text Mining Method for Relation Extraction from Biomedical Literature
The wealth of interaction information provided in biomedical articles has motivated the implementation of text mining approaches to automatically extract biomedical relations. This paper presents an unsupervised method based on pattern clustering and sentence parsing to deal with biomedical relation extraction. The pattern clustering algorithm is based on the Polynomial Kernel method, which identifies inte...